SVM

Before moving forward with the to-do list, let's throw an SVM at it.

For many reasons, Random Forest is usually a very good baseline model. In this particular case I started with the polynomial OLS as the baseline model, simply because the correlations made it evident that the relationship between temperature and consumption follows a polynomial shape. Now let's see how a support vector machine does.

/home/runner/work/strom/strom/.venv/lib/python3.12/site-packages/sklearn/svm/_classes.py:31: FutureWarning:

The default value of `dual` will change from `True` to `'auto'` in 1.5. Set the value of `dual` explicitly to suppress the warning.

/home/runner/work/strom/strom/.venv/lib/python3.12/site-packages/sklearn/svm/_base.py:1237: ConvergenceWarning:

Liblinear failed to converge, increase the number of iterations.
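Both warnings come from scikit-learn's liblinear-based linear SVM. The pipeline itself is not shown here, so the following is only a minimal sketch, assuming the model is a `LinearSVR` and using synthetic stand-in data. It shows the usual remedies: set `dual` explicitly, scale the features, and raise `max_iter`.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in data (assumption): consumption falls with temperature.
rng = np.random.default_rng(0)
temperature = rng.uniform(-5, 30, size=300)
consumption = 30 - 0.8 * temperature + rng.normal(0, 2, size=300)
X = temperature.reshape(-1, 1)

pipe = make_pipeline(
    StandardScaler(),  # liblinear converges much faster on scaled features
    LinearSVR(
        dual=True,        # explicit value silences the FutureWarning
        max_iter=10_000,  # default is 1000; illustrative value, not tuned
        random_state=0,
    ),
)

# Record warnings instead of printing them, to see whether the fit converged.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.fit(X, consumption)

converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
r2 = pipe.score(X, consumption)
print(f"converged: {converged}, train R2: {r2:.3f}")
```

If the warning persists even with scaled features and a larger `max_iter`, loosening `tol` or lowering `C` are further options, at the cost of a different fit.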

Model Cards provide a framework for transparent, responsible reporting. 
 Use the vetiver `.qmd` Quarto template as a place to start, 
 with vetiver.model_card()
Writing pin:
Name: 'wd-svm'
Version: 20250226T185846Z-b6de7
⏩ stepit 'svm_raw': Starting execution of `strom.modelling.assess_model()` 2025-02-26 18:58:46
⏩ stepit 'get_single_split_metrics': Starting execution of `strom.modelling.get_single_split_metrics()` 2025-02-26 18:58:46
✅ stepit 'get_single_split_metrics': Successfully completed and cached [exec time 0.0 seconds, cache time 0.0 seconds, size 1.0 KB] `strom.modelling.get_single_split_metrics()` 2025-02-26 18:58:46
♻️  stepit 'cross_validate_pipe': is up-to-date. Using cached result for `strom.modelling.cross_validate_pipe()` 2025-02-26 18:58:46
✅ stepit 'svm_raw': Successfully completed and cached [exec time 0.1 seconds, cache time 0.0 seconds, size 12.7 KB] `strom.modelling.assess_model()` 2025-02-26 18:58:46

Metrics

Metric                                  Single Split            CV
                                        train       test        test        train
MAE - Mean Absolute Error               2.759698    2.527988    3.087172    3.141476
MSE - Mean Squared Error                17.777598   19.129205   15.989639   21.179925
RMSE - Root Mean Squared Error          4.216349    4.373695    3.781907    4.580536
R2 - Coefficient of Determination       0.815390    0.804477    -7.118546   0.784520
MAPE - Mean Absolute Percentage Error   0.311556    0.288694    0.660358    0.254285
EVS - Explained Variance Score          0.818426    0.805457    -1.855572   0.824080
MeAE - Median Absolute Error            2.164015    1.702269    2.542521    2.327501
D2 - D2 Absolute Error Score            0.612945    0.658054    -1.968504   0.553734
Pinball - Mean Pinball Loss             1.379849    1.263994    1.543586    1.570738
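To make the metric names concrete, here is a from-scratch sketch of a few of them in NumPy. It is not the code behind the table (that presumably relies on `sklearn.metrics`), and the function name `regression_metrics` and the toy data are made up for illustration. Two things it shows: the pinball loss at quantile 0.5 is exactly half the MAE, which matches the table, and R2 can go strongly negative when the observed values vary little but the predictions carry an offset. That is one possible reason for the negative CV test scores.

```python
import numpy as np


def regression_metrics(y_true, y_pred, alpha=0.5):
    """A handful of regression metrics written out from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mae = np.mean(np.abs(err))
    mse = np.mean(err**2)
    # Pinball loss: under-predictions weighted by alpha, over-predictions by 1 - alpha.
    pinball = np.mean(np.maximum(alpha * err, (alpha - 1) * err))
    # R2 compares the model against always predicting the mean of y_true.
    ss_res = np.sum(err**2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)

    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
        "Pinball": pinball,
    }


# Toy fold: the observations barely vary, the predictions are all 3 units too high.
y_true = np.array([10.0, 11.0, 12.0, 11.0, 10.0])
y_pred = y_true + 3.0
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name:8s} {value: .3f}")
```

In this toy fold the MAE is a modest 3 units, yet R2 is far below zero because the mean of the fold would have been a much better predictor than the shifted model.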

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check the residuals to assess the goodness of fit:

  • Do they look like white noise, or is there a pattern?
  • Is there heteroscedasticity?
  • Is there non-linearity?
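The checks above are visual, but a rough numeric companion can help. The sketch below sorts the residuals by predicted value, splits them into equal-sized bins, and reports the mean and standard deviation per bin. Bin means that drift away from zero hint at non-linearity, and standard deviations that grow or shrink across bins hint at heteroscedasticity. The helper `binned_residual_summary` and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def binned_residual_summary(y_pred, residuals, n_bins=4):
    """Mean and standard deviation of the residuals within bins of the prediction."""
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    order = np.argsort(y_pred)  # sort residuals by predicted value
    chunks = np.array_split(residuals[order], n_bins)
    means = np.array([chunk.mean() for chunk in chunks])
    stds = np.array([chunk.std() for chunk in chunks])
    return means, stds


# Synthetic heteroscedastic residuals: the noise grows with the prediction.
rng = np.random.default_rng(0)
y_pred = np.linspace(1.0, 10.0, 400)
residuals = rng.normal(0.0, 1.0, size=400) * 0.3 * y_pred

means, stds = binned_residual_summary(y_pred, residuals)
print("bin means:", np.round(means, 2))
print("bin stds: ", np.round(stds, 2))
```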

Normality of Residuals

Check:

  • Are the residuals normally distributed?
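Alongside a histogram or Q-Q plot, a simple test statistic can back up the visual impression. The sketch below implements the Jarque-Bera statistic from its definition, using the sample skewness and excess kurtosis. Under normality it is asymptotically chi-square distributed with 2 degrees of freedom, whose survival function is `exp(-x / 2)`. The data are synthetic and only illustrate the behaviour.

```python
import math

import numpy as np


def jarque_bera(residuals):
    """Jarque-Bera statistic and asymptotic p-value for a 1-D sample."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    z = r - r.mean()
    m2 = np.mean(z**2)
    skew = np.mean(z**3) / m2**1.5
    excess_kurtosis = np.mean(z**4) / m2**2 - 3.0
    jb = n / 6.0 * (skew**2 + excess_kurtosis**2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return float(jb), p_value


rng = np.random.default_rng(0)
jb_normal, p_normal = jarque_bera(rng.normal(size=2000))
jb_skewed, p_skewed = jarque_bera(rng.exponential(size=2000))
print(f"normal sample: JB={jb_normal:.2f}, p={p_normal:.3f}")
print(f"skewed sample: JB={jb_skewed:.2f}, p={p_skewed:.3g}")
```

A small p-value speaks against normality. With large samples even mild departures become significant, so the plot remains the better guide to whether the departure matters.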

Leverage

Scale-Location plot

Residuals Autocorrelation Plot
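For reference, the quantity behind an autocorrelation plot can be computed in a few lines. This is a sketch with synthetic AR(1) residuals, not the report's plotting code. Lags whose autocorrelation falls outside roughly ±1.96/sqrt(n) suggest that the residuals still carry time structure the model has missed.

```python
import numpy as np


def autocorrelation(residuals, max_lag=10):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    r = np.asarray(residuals, dtype=float)
    z = r - r.mean()
    denom = np.sum(z**2)
    return np.array(
        [np.sum(z[lag:] * z[: r.size - lag]) / denom for lag in range(max_lag + 1)]
    )


# Synthetic residuals with memory: AR(1) process with coefficient 0.8.
rng = np.random.default_rng(0)
n = 1000
noise = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]

acf = autocorrelation(ar1, max_lag=5)
bound = 1.96 / np.sqrt(n)  # approximate 95% band for white noise
print("acf:  ", np.round(acf, 2))
print("bound:", round(float(bound), 3))
```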

Residuals vs Time

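Well, not that bad on the single split, but the cross-validated scores tell another story: the test R2 drops to -7.12, against 0.78 on the training folds, so the model is overfitting quite a lot. Let's see whether tuning the hyperparameters helps.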
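A common response to overfitting is a hyperparameter search. The internals of `strom.modelling.grid_search_pipe()` are not shown here, so the following is only a sketch of how such a search could look with scikit-learn. It assumes a `LinearSVR` in a pipeline, a time-ordered split via `TimeSeriesSplit`, and an illustrative parameter grid on synthetic data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in data (assumption): one temperature-like feature.
rng = np.random.default_rng(0)
X = rng.uniform(-5, 30, size=(300, 1))
y = 30 - 0.8 * X[:, 0] + rng.normal(0, 2, size=300)

pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("svm", LinearSVR(dual=True, max_iter=10_000, random_state=0)),
    ]
)

# Illustrative grid: C sets the regularization strength, epsilon the width of
# the tube inside which errors are not penalized.
param_grid = {
    "svm__C": [0.1, 1.0, 10.0],
    "svm__epsilon": [0.0, 0.5, 1.0],
}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),  # each fold trains on the past, tests on the future
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

best_params = search.best_params_
best_mae = -search.best_score_
print(best_params, f"CV MAE: {best_mae:.3f}")
```

Scoring on MAE rather than R2 is a deliberate choice in this sketch: R2 can swing wildly on folds where the target varies little, which makes it a noisy selection criterion.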

♻️  stepit 'grid_search_pipe': is up-to-date. Using cached result for `strom.modelling.grid_search_pipe()` 2025-02-26 18:58:50
Writing pin:
Name: 'wd-svm'
Version: 20250226T185850Z-46ac4
⏩ stepit 'svm_tuned': Starting execution of `strom.modelling.assess_model()` 2025-02-26 18:58:50
⏩ stepit 'get_single_split_metrics': Starting execution of `strom.modelling.get_single_split_metrics()` 2025-02-26 18:58:50
✅ stepit 'get_single_split_metrics': Successfully completed and cached [exec time 0.0 seconds, cache time 0.0 seconds, size 1.0 KB] `strom.modelling.get_single_split_metrics()` 2025-02-26 18:58:50
♻️  stepit 'cross_validate_pipe': is up-to-date. Using cached result for `strom.modelling.cross_validate_pipe()` 2025-02-26 18:58:50
✅ stepit 'svm_tuned': Successfully completed and cached [exec time 0.1 seconds, cache time 0.0 seconds, size 12.6 KB] `strom.modelling.assess_model()` 2025-02-26 18:58:50

Metrics

Metric                                  Single Split            CV
                                        train       test        test        train
MAE - Mean Absolute Error               2.341431    2.263821    2.147532    2.581758
MSE - Mean Squared Error                15.653589   17.697927   7.877316    17.898406
RMSE - Root Mean Squared Error          3.956462    4.206890    2.711544    4.228775
R2 - Coefficient of Determination       0.837447    0.819106    -1.935593   0.817129
MAPE - Mean Absolute Percentage Error   0.184828    0.174699    0.474445    0.178073
EVS - Explained Variance Score          0.839705    0.842886    -1.089626   0.820615
MeAE - Median Absolute Error            1.497370    1.403049    1.790379    1.581437
D2 - D2 Absolute Error Score            0.671608    0.693786    -0.894147   0.631738
Pinball - Mean Pinball Loss             1.170716    1.131910    1.073766    1.290879

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check the residuals to assess the goodness of fit:

  • Do they look like white noise, or is there a pattern?
  • Is there heteroscedasticity?
  • Is there non-linearity?

Normality of Residuals

Check:

  • Are the residuals normally distributed?

Leverage

Scale-Location plot

Residuals Autocorrelation Plot

Residuals vs Time

TODOs